It is often difficult or expensive to measure cutoff calls, which are 
usually caused by failures and malfunctions in some component of 
the telephone network. Therefore, it is desirable to have an indirect 
method for estimating the number of cutoff calls caused by equipment 
failures in a switching system or facility. This paper discusses a 
mathematical model that can be used to determine the cutoff call rate 
in a network component as a function of the failure modes and failure 
rates in the component, and the call holding time distribution. It 
includes a discussion of a paradigm for developing reliability objec- 
tives that directly reflect service as it is seen by end users. The 
mathematical model, an M/M/c/c queuing system with server failures, 
is described. A strong law of large numbers and a central limit 
theorem for the number of cutoff calls, accumulated either according 
to the number of failures or over time, are developed. An example 
from a switching system is given to show how these results are applied 
in specific cases. 



I. INTRODUCTION AND SUMMARY 

The purpose of this paper is to describe a mathematical model for 
the rate of cutoff calls caused by failures and malfunctions in telephone 
equipment. The cutoff call behavior of almost any piece of telephone 
equipment that serves callers can be analyzed using this technique, 
but the primary applications we have in mind are large integrated 
systems containing many components, such as switching systems and 
transmission systems (trunk groups). The model relates the rate of 
cutoff calls produced by failures in the equipment and its subsystems 
to the failure modes in the equipment, their severity and frequency of 
occurrence, and the call-holding-time distribution. 

The interaction of telephone call requests with service equipment 
has often been successfully described using queuing models. Therefore, 
it seems reasonable that a study of the effects of equipment failures on 
the calls in a telephone system should be feasible within the context of 
the classical queuing models of telephony. This is the approach 
adopted here, with the additional feature that the servers may be 
unreliable and subject to failures of a kind that cause the customer (if 
any), in service at a position whose server fails, to be dropped from the 
system at the time of the failure. Forys and Messerli have previously 
studied trunk groups containing unreliable servers. 1 Their interest was 
in characterizing the effect on arriving calls of one or more short- 
holding-time (hence, very likely to be malfunctioning) trunks in the 
group, whereas here the interest is primarily in the effects of unreliable 
servers that may fail singly or together in groups, on customers who 
are already in service. 

The paper is divided into five sections. Section II contains a general 
discussion of reliability objectives as they apply to telephone equip- 
ment, and the paradigm for developing reliability objectives that 
directly reflect service as it is seen by the customer. We observe that 
the critical step that has been lacking is the ability to translate 
equipment reliability into rates of occurrence and duration of cus- 
tomer-perceivable problems, such as cutoff calls and network connec- 
tion failures, that are produced by failures and outages. 

In Section III, the structure of the mathematical models to be used 
is described. The basic structure is one of a queuing system with server 
failures, and, using this structure, the probability that a call in the 
system will be cut off is determined. The way one describes mathe- 
matically the system organization and failure modes is also covered in 
this section. The probability of cutoff can be computed under quite 
general conditions on the arrival process, the service times, and the 
queue discipline, because it depends only on what happens after the 
customer enters service. 

Section IV describes a more specialized queuing model, in the 
context of which certain limit laws for the cumulative number of cutoff 
calls can be obtained. This is the M/M/c blocking system with server 
failures, and both a strong law and a central limit theorem are obtained. 
The eventual use of these limit laws, as the basis for constructing 
statistical tests for determining compliance with objectives, is also 
briefly discussed. Section V is devoted to the single-server case, and 
explicit calculation of all parameters of interest. 

Finally, Section VI gives an example of the application of this theory 
to the estimation of cutoff call rates in a toll switching system. It is 
important to be able to do this kind of analysis because one may wish 
to predict cutoff call performance for a system that is still being 
designed. This technique is then an example of an indirect, albeit 
approximate, method of estimating a cutoff call rate for which no 
satisfactory direct method may be available. 

Two appendices contain all proofs and other mathematical details 
that, otherwise placed, would interfere with the flow of the text. 

II. RELIABILITY OBJECTIVES AND CUSTOMER SERVICE 

2.1 General 

It is currently recognized that the most desirable way to specify 
performance and service objectives for telephone network equipment 
is to use, in addition to economic information, considerations of how 
the operation of this equipment affects service as it is seen by the 
customer. In order to do this for reliability objectives, we need to 
realize that customers do not perceive outages, failures, and malfunc- 
tions as such. They are aware of them only insofar as they cause 
service problems detectable by users who generally are not aware of 
the internal operations of the telephone network. To achieve the goal 
of determining equipment reliability objectives based on customer 
needs and expectations, then, the following steps are required: 

(i) Determine the customer-perceivable service effects of the reli- 
ability problems to be controlled. 

(ii) Determine the quantitative relationships between the fre- 
quency and duration of reliability problems in the system or equipment 
and the rates of occurrence and duration of the service effects found 
in the first step. 

(iii) Use these relationships to translate the customer service 
objectives for the system, which control the customer-perceivable effects 
stemming from reliability problems, into internal reliability objectives. 

This paper focuses on the second step for a particular service effect: 
cutoff calls. 

2.2 Service effects 

From the customer's point of view, the primary service effects of 
failures and malfunctions are cutoff calls, ineffective attempts (network 
connection failures), isolation (line and toll), and transmission impair- 
ments (excessive loss, noise, etc.). Cutoff calls will be discussed at 
length below. Ineffective attempts, or network connection failures, can 
be caused by failures and malfunctions because the unavailability of a 
portion of the telephone network increases the network's blocking 
probability during the time this portion of the network is out of service. 
If the failed equipment is a customer's loop, or a part of the local 
central office that disables the customer's line functions, causing a 
customer to be unable to communicate with the local central office, 
the customer experiences line isolation for the duration of the failure. 
If the failed equipment is a toll-connecting trunk group from a 
customer's local central office, the customer experiences toll isolation, 
meaning that toll calls to or from certain areas cannot be placed or 
received. 

Transmission impairments can be caused by malfunctions such as 
equipment operating outside tolerances. These phenomena are well- 
understood and measurement plans are in place to return relevant 
information about transmission problems to maintenance forces so 
that abnormal conditions may be corrected. These will not be discussed 
further. 

The rate of network connection failures and the duration of isola- 
tions are determined primarily by the duration of the outage. Thus, 
analysis of these service problems is helpful in determining reliability 
objectives and maintenance policies to limit outage duration. We will 
see below that the rate of cutoff calls is primarily driven by the rate of 
failures, so that analysis of cutoff calls is useful mainly in determining 
objectives for frequency of occurrence of outages. Of course, a compre- 
hensive strategy for reliability management should deal with these 
complementary facets of equipment reliability in a unified way, and 
maintenance (service restoration and equipment repair) policies play 
an important part here. An objective for frequency of occurrence of 
failures, together with a maintenance policy, implies a certain total 
outage time for the equipment. Similarly, an objective for total outage 
time, together with a service restoration and equipment repair strategy, 
limits the number of times outages may occur. Although this paper 
deals only with cutoff calls and frequency of occurrence of outages, it 
should be borne in mind that a unified approach to reliability objec- 
tives, combining considerations not only of cutoff calls and outage 
frequency but also of network connection failures and outage duration, 
is most desirable. 

2.3 Types of failures included 
2.3.1 Causes of cutoff calls 

A cutoff call is a connected (stable) call that has been terminated 
other than by an on-hook by either party. The event of termination is 
sometimes referred to as a cutoff, for short (as is a call that is so 
affected). The terminology is intended to connote an unintentional, 
unexpected interruption. International Telegraph and Telephone 
Consultative Committee (CCITT) terminology refers to a cutoff-causing 
failure in a switching system as a "premature release malfunction in 
an exchange." 

Cutoff calls are caused by equipment failures (including recovery 
actions), and other external factors, such as radio fades and in-band 
talkoff (simulation of the 2600-Hz supervisory signal by a signal 
emitted by one of the parties). The termination takes place at the instant 
the failure or other event begins, so the rate of cutoffs is influenced 
primarily by the rate of failures (this is demonstrated in eq. (7)). 
Cutoffs are related to reliability, then, just as ineffective attempts or 
network connection failures are related to availability. To determine 
the rate of cutoff calls seen by a telephone user, the cutoff call 
performances of individual switching and transmission systems are 
combined in a network model. A suitable model is one for the reliability 
of a series system consisting of switching systems and trunk groups. 

2.3.2 Scope of the model 

The reliability problems covered by the model are those of failure 
and repair of entire systems and parts of systems, and those failures 
and malfunctions that may not completely disable a system or subsys- 
tem, but that cut off calls when they occur. In the first case, systems 
and subsystems will be considered to be either operating properly and 
fully available for use, or not operating at all and unavailable. Cutoff 
calls caused by improper operation, or operation outside tolerances, of 
a system or subsystem can also be treated. The key notion is that any 
event that causes cutoff calls when it occurs can be called a "failure" 
for purposes of this discussion. The model can accommodate many 
different "failure" modes, as long as the occurrence times and severities 
of these events can be characterized sufficiently well that failure 
processes and cutoff impacts (Section 3.2) can be assigned. In partic- 
ular, the model could in principle include such events as radio fades 
and in-band talkoff as "failure modes." However, in studying cutoff 
calls as related to equipment reliability, this is not recommended, 
because these are external events, not caused by an equipment failure 
or malfunction which could be controlled by preventive or corrective 
action by the telephone company. 

As for causes of failure, for the model there is no restriction on the 
cause of the failure or malfunction. All that is required is that one be 
able to list the kinds of events that cause cutoffs, and describe proba- 
bilistically the times between incidents for each kind of event. The 
scope of this work encompasses all failures which lead to cutoff calls, 
regardless of cause, including hardware (component failure), software 
and firmware faults, human intervention errors, office database errors, 
and so on. 

2.4 Uses of the mathematical model 

This model finds three primary applications in system analysis and 
design. First, it can be used to make the translation which allows 
system cutoff call objectives to determine reliability objectives and 
maintenance policies for switching and transmission systems. 
Reliability objectives should not be viewed as ends in themselves, but only 
as means by which objectives for those aspects of customer service 
that are affected by reliability problems can be met. Second, the model 
has value as a predictive tool. System designers can use the probability of 
cutoff as a figure of merit for hypothetical system designs, architec- 
tures, and reliability characteristics. Systems that have not yet been 
constructed can be compared for this aspect of service quality, and 
this comparison can be a factor in deciding among competing designs, 
for example. Its third major use is to provide a framework within 
which to perform statistical tests, based on observed cutoff call rates, 
to see whether objectives are being met. In systems where cutoff calls 
are not measured, the models enable inferences to be made about the 
cutoff call rate based on other kinds of data, such as reliability records 
of equipment failures and malfunctions. Since cutoff calls are often 
difficult or expensive to measure in a given system, these techniques 
provide another, perhaps more attractive, means of understanding this 
important service problem. 

III. MODEL DESCRIPTION AND PROBABILITY OF CUTOFF 

In this section, we discuss the structure of the mathematical model 
for cutoff calls and reliability of telephone equipment. It starts with an 
outline-like guide to the sequence of results which make up the 
mathematical model. As an aid to seeing where the details fit into the 
overall scheme, this guide can be referred to while reading the remain- 
der of the paper. A queuing model with server failures is covered, as is 
the organization of the servers and failure modes. Physical interpre- 
tation is given, and some probabilistic insights are added to help clarify 
the ideas. Finally, the probability that a call that has been accepted by 
the system will be cut off is computed. 

3.1 Outline of results 
3.1.1 Relation of probability of cutoff to equipment reliability 

The first important result obtained is in Section 3.5, where the 
probability that a call that has been accepted by the system will be cut 
off is computed. This probability can be thought of as a figure of merit 
for the system in question, and can be computed under weak assump- 
tions about the arrival process, the holding times, and the interfailure 
times. However, the probability of cutoff, by itself, is not enough to 
give a good understanding of how a system will behave with respect to 
cutting off calls. In particular, there are two important questions on 
which knowing the probability of cutoff alone sheds no light. First, 
does the observed cutoff call rate have any relation to the probability 
of cutoff ? Second, what is the structure of the stochastic process which 
counts the number of calls cut off in a time interval? How much 
variability can be expected in such a count, for example? 

3.1.2 Measurements and consistent estimation of the probability of 
cutoff 

Section IV is devoted to an exploration of these questions for a more 
specialized system, the M/M/c/c queue with server failures. In answer 
to the first question, Corollaries 5 and 6 show that the observed cutoff 
call rate converges to the probability of cutoff as given by eq. (8). This 
means that, in this case, measurements can be relied upon to consis- 
tently estimate the probability of cutoff, which may be controlled by 
an objective. Also, when a prediction about the cutoff probability in a 
new system is made, it can reasonably be expected that the cutoff call 
rate shown by the system in operation will approach the predicted 
value (subject, of course, to the quality of the inputs to the prediction). 

3.1.3 Asymptotic distribution of the number of cutoff calls 

In answer to the second question, Theorems 7 and 9 show that the 
number of cutoff calls is, when suitably normalized, asymptotically 
normally distributed. The asymptotic variance of the number of cutoff 
calls [Theorem 8(b)], together with the asymptotic normality, suggests 
the variability to be expected in the observed (normalized) number of 
cutoff calls: about 68 percent of observations fall within one standard 
deviation of the mean, etc. Finally, the asymptotic distribution of the 
number of cutoff calls could be used as the basis for a statistical test 
for determining whether the objective is being met, although this is 
not accomplished in this paper. 

3.2 Mathematical description of cutoff call model 

The equipment will be modeled as a c-server queuing system. Calls 
(requests for service) arrive at the system at times $\tau_1, \tau_2, \ldots$. Denote 
by $\tau'_n$ the time that the $n$th arrival enters service. If this is a blocking 
system and all servers are occupied at time $\tau_n$, the $n$th arrival never 
enters service, and for later convenience, $\tau'_n$ will be taken to be $-\infty$ in 
this case. Throughout Section III the arrival process may be any 
arbitrary point process. Each call has associated with it a (nonnegative) 
holding time that it wishes to spend using the resources of the system. 
It is assumed that a single call occupies only a single server in the 
system during its entire holding time (this will be important later in 
discussion of the organization of the failure modes). The holding times 
are denoted by $Y_1, Y_2, \ldots$, and are taken to be mutually independent 
and identically distributed, and independent of the arrival process. 

So far, we have just described an ordinary queuing model. The 
additional feature that distinguishes the models including equipment 
failure is that the servers may be unreliable. That is, at certain 
(random) times, all the servers, or certain groups of servers, may cease 
serving the customers at their positions, and the affected customers 
will be forced to depart prematurely from the system at these times. 
Adopting the natural physical terminology for the mathematical 
model, these customers will be said to have been "cut off." Suppose 
that there are m different failure modes in the system. That is, there 
are m different ways in which various groups of servers (and possibly 
all servers) can fail in such a way as to cause cutoffs at the instant the 
failure begins. Any particular server may be affected by many failure 
modes, and many different configurations of failed servers may be 
included in a single failure mode. For example, suppose a switching 
system having 1200 terminations (lines and trunks) is made up of ten 
identical units, each serving 120 terminations. Then this system has a 
failure mode at 120 servers (terminations) — this would not be counted 
as ten separate failure modes if all these units had the same failure 
characteristics. With each failure mode, associate a renewal process 
listing the times at which failures of this type occur. These m processes 
will be called "failure processes." Let $F^i$ be the distribution of the 
interrenewal times for the $i$th process, and let $\lambda_i$ be the reciprocal of 
the mean time between renewals, $\lambda_i^{-1} = \int_0^\infty x \, dF^i(x)$. Let the epochs in 
the $i$th failure process be denoted by $S_1^i, S_2^i, \ldots$. It is assumed that 
these failure processes are mutually independent and independent of 
the arrival- and holding-time processes. The latter independence as- 
sumption is reasonable when the arrivals have no prior knowledge 
about the state of the system at the time of arrival. 

Also associated with the $i$th failure mode is a number $p_i$ between 
zero and one. The quantity $p_i$ represents the probability that a call in 
the system will be cut off when a failure of type $i$ occurs, and is called 
the cutoff impact of failure mode $i$. The severity of a failure of type $i$ 
is indicated by $p_i$. If $p_i = 1$ then the $i$th failure mode is an entire system 
failure, and, with probability one, all calls in service are cut off when 
such a failure occurs. If, on the other hand, $p_i$ is close to zero, then this 
describes a minor failure, and fewer calls will be cut off when such a 
failure occurs. We will take $p_i \neq 0$ for every $i$ since a failure mode with 
cutoff impact zero can be ignored. 

3.3 Correspondence with physical situation 

Imagine a call using the resources of some telephone system (for 
definiteness, say a switching system), in either the setup phase or the 
conversation (stable) phase. Many elements of the system are used to 
provide and maintain the conversation path that is the electrical 
connection from one side (incoming or originating) of the system to 
the other (outgoing or terminating). Failure of some of these elements 
may cause the call to be dropped from the system without an on-hook 
by either party. In the queuing model, it is not these elements that are 
thought of as the servers. Rather, a single call is thought of as 
occupying a single server, such as a pair of terminations or a path 
through a system, which may be subject to being disabled by the 
failure of some of these elements. From this point of view, any 
particular server may be affected by several failure modes. 

3.4 Probabilistic interpretation 

Before turning to the computation of the probability that a call that 
has been accepted by the system will be cut off, the following proba- 
bilistic heuristics are offered as an aid to clarifying the idea of the 
model. 

The event that a call in the system is cut off can be conceptualized 
as a realization of a competition process. Suppose a call having holding 
time Y enters the system at time t. At the entrance time t, m clocks 
are set running, with the ith clock's running time having the distri- 
bution of the excess lifetime of the time between failures for the ith 
failure process at time t. If the holding time Y expires before any of 
the clocks run down, no failures occur and, hence, no cutoff can occur. 
If one of the clocks runs down first (say the $j$th one), a biased coin 
($P\{\text{heads}\} = p_j$) is tossed. If the coin comes up heads, the call is cut 
off, and the experiment stops for this call. If the coin comes up tails, 
the call is not cut off, and the experiment continues, with the $j$th clock 
now running according to the distribution $F^j$. For this call, the 
experiment stops either when it has been cut off or when it departs normally 
from the system. 

The computation, which is performed in the next section, follows 
this description by first determining the probability of no cutoff and 
then subtracting from one. 
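
The competition heuristic above can be sketched as a small Monte Carlo experiment. This is only an illustrative sketch, not part of the original analysis: it assumes every failure process is a stationary Poisson process (so each clock, including its excess lifetime at the entrance time, is exponential) and that holding times are exponential. The function names and the numerical values in the usage note are hypothetical.

```python
import random


def simulate_call(holding_time, rates, impacts, rng):
    """Return True if a call with the given holding time is cut off.

    rates[i]   : failure rate of mode i (assumed Poisson, an assumption here)
    impacts[i] : cutoff impact p_i of mode i
    """
    # One clock per failure mode, measured from the call's entrance time.
    clocks = [rng.expovariate(lam) for lam in rates]
    while True:
        # Index of the clock that runs down first.
        j = min(range(len(clocks)), key=clocks.__getitem__)
        if clocks[j] >= holding_time:
            return False  # holding time expired first: normal departure
        if rng.random() < impacts[j]:
            return True   # biased coin came up heads: the call is cut off
        # Tails: the call survives and clock j is restarted.
        clocks[j] += rng.expovariate(rates[j])


def estimate_cutoff_probability(n_calls, service_rate, rates, impacts, seed=0):
    """Fraction of simulated calls that are cut off (exponential holding times)."""
    rng = random.Random(seed)
    cut = sum(
        simulate_call(rng.expovariate(service_rate), rates, impacts, rng)
        for _ in range(n_calls)
    )
    return cut / n_calls
```

For instance, with hypothetical values `rates=[0.2, 0.5]`, `impacts=[1.0, 0.1]`, and `service_rate=1.0`, the estimate should settle near the closed-form value of eq. (8) for this special case.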

3.5 Probability of cutoff 

With this section, we begin following the outline of Section 3.1. The 
sequence of results and their proofs is simply a mathematical transla- 
tion of the description given in Section 3.4. Lemma 1, while of inde- 
pendent interest, is used here only in establishing the main result of 
this section, which is Theorem 2. 

Lemma 1: Let $\{N(t) : t \ge 0\}$ be a renewal counting process with 
interrenewal time distribution $F$. Then for $t, y \ge 0$ and $k \ge 1$, the 
probability that there are $k$ renewals in the interval $[t, t + y]$ is given 
by 

$$\int_0^t g_k(t - s, y) \, dM_0(s), \tag{1}$$

where 

$$g_k(u, y) = \int_u^{u+y} \left[ F_{k-1}(u + y - x) - F_k(u + y - x) \right] dF(x), \tag{2}$$

with $F_k$ the $k$-fold convolution of $F$ with itself, $F_0$ equals $V$, the 
standard right-continuous unit step function with jump at the origin, 
and $M_0$ the augmented renewal function for the process. For $k = 0$, 
the probability that there are no renewals in this interval is given by 
$\int_0^t [1 - F(t + y - s)] \, dM_0(s)$. 

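Theorem 2: Let $\tilde{M}_0^i$ be the augmented renewal function for the 
defective distribution $(1 - p_i)F^i$, 

$$\tilde{M}_0^i(x) = \sum_{k=0}^{\infty} (1 - p_i)^k F_k^i(x), \tag{3}$$

and let 

$$g_i(u, y) = 1 - F^i(u) - p_i \int_u^{u+y} \tilde{M}_0^i(u + y - x) \, dF^i(x). \tag{4}$$

Then the probability that a call entering the system at time $t$ is cut off 
is given by 

$$1 - \int_0^\infty \left[ \prod_{i=1}^{m} \int_0^t g_i(t - s, y) \, dM_0^i(s) \right] dH(y), \tag{5}$$

where $M_0^i$ is the augmented renewal function of the $i$th failure process, 
as in Lemma 1, and $H$ is the holding-time distribution. In the limit as $t$ 
approaches infinity, this becomes 

$$1 - \int_0^\infty \prod_{i=1}^{m} \left[ \lambda_i \int_0^\infty g_i(u, y) \, du \right] dH(y). \tag{6}$$

Because $\lambda_i \int_0^\infty [1 - F^i(u)] \, du = 1$, each factor in eq. (6) can also be 
written as $1 - \lambda_i p_i \int_0^\infty \int_u^{u+y} \tilde{M}_0^i(u + y - x) \, dF^i(x) \, du$. 
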



If the arrival process is independent of the remaining queuing and 
failure processes, the probability that the nth call will be cut off, given 
that it enters the system, can be computed by integrating eq. (5) 
against the distribution of $\tau'_n$. In case all the failure processes are 
stationary Poisson processes, the probability of cutoff is constant and 
does not depend on the entrance time of the call. 
Corollary 3: Suppose $F^i(x) = 1 - e^{-\lambda_i x}$ for $i = 1, \ldots, m$. Then every 
call in the system has probability of cutoff given by 

$$1 - \int_0^\infty \exp\left( -\sum_{i=1}^{m} \lambda_i p_i \, y \right) dH(y). \tag{7}$$

If, in addition, the call-holding-time distribution is exponential, 
$H(y) = 1 - e^{-vy}$, the probability of cutoff reduces to 

$$\theta = \frac{\sum_{i=1}^{m} \lambda_i p_i}{v + \sum_{i=1}^{m} \lambda_i p_i}. \tag{8}$$

These are obtained by appropriate substitution in eq. (5). 
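
As a rough numerical illustration of Corollary 3, the sketch below evaluates eq. (8) directly and eq. (7) for two holding-time distributions: one with a density (integrated by a simple midpoint rule) and a constant holding time (a point mass, for which eq. (7) collapses to a single exponential). The function names, the truncation point `upper`, and the parameter values in the usage note are assumptions made for illustration only.

```python
import math


def total_cutoff_hazard(rates, impacts):
    """Sum of lambda_i * p_i over all failure modes."""
    return sum(lam * p for lam, p in zip(rates, impacts))


def cutoff_probability_exponential(service_rate, rates, impacts):
    """Eq. (8): exponential holding times with rate v = service_rate."""
    hazard = total_cutoff_hazard(rates, impacts)
    return hazard / (service_rate + hazard)


def cutoff_probability_constant(holding_time, rates, impacts):
    """Eq. (7) when H is a point mass at holding_time."""
    return 1.0 - math.exp(-total_cutoff_hazard(rates, impacts) * holding_time)


def cutoff_probability_general(density, rates, impacts, upper, steps=100_000):
    """Eq. (7) by the midpoint rule, for H with the given density on [0, upper].

    `upper` truncates the integral; it must be large enough that the
    holding-time density is negligible beyond it (an assumption of this sketch).
    """
    hazard = total_cutoff_hazard(rates, impacts)
    h = upper / steps
    survive = 0.0
    for k in range(steps):
        y = (k + 0.5) * h  # midpoint of the k-th subinterval
        survive += math.exp(-hazard * y) * density(y)
    return 1.0 - survive * h
```

With hypothetical values `rates=[0.2, 0.5]`, `impacts=[1.0, 0.1]`, and unit mean holding time, the exponential and constant cases give slightly different cutoff probabilities, which shows how eq. (7) depends on the holding-time distribution and not only on its mean.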

3.6 Discussion 

The probability that a call already in the system will be cut off has 
been computed for a queuing system with unreliable servers. The 
arrival process and queue discipline may be arbitrary; this is a reflec- 
tion of the fact that the event of cutoff depends only on what happens 
after the call enters the system. The limiting argument used to estab- 
lish eq. (6) can be carried out even if the arrival process depends on 
the service time process (as in systems with state-dependent arrival 
rates), although the probability that the nth call will be cut off is more 
difficult to compute in this case. We have assumed the service times 
are independent and identically distributed. This could be relaxed, but 
for most ordinary message telephone service applications it does not 
seem necessary to introduce this complication. As can be seen from 
eq. (7), great simplification results if it can be assumed that the failure 
processes are stationary Poisson processes. In practice, this assumption 
has often been used because, in studying large systems from a great 
distance, data that would enable one to characterize the failure proc- 
esses in the system in more detail are often not available. When the 
conditions that obtain in the physical situation are difficult to identify 
exactly, it may not be possible to determine the information needed to 
make successful application of a more general model. 

IV. A MARKOV MODEL AND SOME LIMIT LAWS 
4.1 Introduction 

In Section IV we deal, for a more specific queuing system, with the 
second and third items in the outline in Section 3.1. There are many ways 
to particularize the general considerations discussed in Section III, 
depending on the underlying queuing model. For purposes of estima- 
tion of cutoff call rates in telephone systems, certainly it is desirable to 
allow the most general model possible. This might be a transient 
analysis of a queue in which, in addition to the exogenous arrivals, 
there may be feedback and retrials by rejected and cutoff customers, 
and general service and interfailure times. Unfortunately, analytic 
treatment of such a complicated model is not within reach. The 
asymptotic analysis of such general queues, even with perfectly reliable 
servers, is accomplished only approximately in many cases. 
Here, instead, we will study perhaps the simplest of stochastic models 
for this situation, the M/M/c/c queue with stationary Poisson failure 
processes. This decision results from informal consideration of the 
tradeoff between realism of description on one hand and possibility of 
successful execution of analysis on the other. Even in this simple case, 
there are many interesting difficulties. For example, solving numeri- 
cally the Chapman-Kolmogorov equations (Appendix A) for the invar- 
iant distribution of the embedded chain (Section 4.3) is likely to be 
easier than obtaining qualitative insight through analytic solution of 
these equations. No representation is hereby made that the Markovian 
assumptions are particularly accurate in representing reality, or that 
the asymptotic results obtained well describe transient behavior. Nev- 
ertheless, the assumptions are not such gross distortions of the physical 
situation that they render such models useless, and the study of 
simpler models has several important virtues to recommend it. Solu- 
tions can be obtained, the general features of the underlying situation 
remain visible without the technical details that sometimes obscure 
the main ideas, directions for the generalizations that are likely to be 
successful on more complicated models are suggested, and, last but not 
least, results can be checked against data to determine if more general 
models are required. The Markovian model to be described has been 
successfully used in the switching systems area, and predictions made 
from it have shown reasonable agreement with data. This is not to say 
that further refinements of these models would not be valuable. Such 
refinements would be interesting and useful advances in the state of 
the art. 

4.2 Specifications and notation 

In the M/M/c blocking system, let a be the arrival rate, v be the 
service rate, and let {A(t) : t > 0} denote the arrival process. The m 
failure processes are all stationary Poisson processes with rates 
$\lambda_1, \ldots, \lambda_m$, all positive. (In the example in Section 3.2, the failure rate 
for the 120-termination failure mode would be ten times the failure 
rate of a single 120-termination unit.) The system will be assumed to 
recover instantaneously from failures, so that the only effect that a 
failure has is to cause some of the calls in the system to depart 
prematurely, before the completion of their intended holding times. 
Failures, therefore, have no effect on calls that are not already in the 
system. For example, they do not cause an increase in the blocking 
probability of the system. Clearly this is only an approximation to the 
true situation, but it seems to produce acceptable results, for several 
reasons. First, in practical cases, the ratio of average outage time to 
mean time between failures is usually small; here this small number 
has been replaced by zero. Secondly, in this approximation the total 
number of cutoff calls tends to be overestimated because more calls 
are accepted into the system than would be if the failure durations 
were positive. This means that more calls are exposed to the possibility 
of being cut off. Again, if the times between failures are long compared 
to the outage times, the cutoff call rate (number of cutoff calls divided 
by number of arrivals or number of accepted calls) will not be badly 
distorted by this approximation. 

The failure processes interact with the queuing processes in the
following way. Let $B(t)$ denote the number of busy servers at time $t$,
$t \ge 0$, including the effects of failures (as below), and let $C(t, r)$ be the
number of busy servers at time $t$ in an ordinary (no server failures)
M/M/c/c system when there are $r$ in the system at time 0. Then,
whenever a failure of type $i$ occurs, the probability that a call in the
system will be cut off is $p_i$, and the cutting-off events for each of the
calls in the system at that time are assumed to be mutually independent,
as are the cutting-off events corresponding to different failure
times. (Simultaneous failures occur with probability zero since the
distributions of the interfailure times are all continuous.) This models
a situation in which the calls in service at any time are more or less
regularly spread out over the servers in the system, and all parts of the
system subject to a given failure mode are equally vulnerable. The
independence, on the other hand, is invoked to reflect the fact that
this regular distribution obtains only perhaps in a very broad, average
way, and at any given failure epoch, the server occupancy might be
quite irregular. At each epoch in each failure process, then, the number
of calls cut off is a binomial random variable with parameters given by
the number of busy servers at that epoch and the cutoff impact of that
failure mode. That is, at time $S_n^i$, if $B(S_n^i) = k$, the number of calls cut
off is binomially distributed with parameters $k$ and $p_i$. Sometimes
many of the calls in the system will be carried by the unit (group of
servers) experiencing the failure; sometimes proportionately fewer calls
will be carried on this unit. The binomial model provides an approximate
description of this situation. This is a compromise between a
very detailed model that keeps track of individual server busy and idle
times and the individual identities and times of failure of server groups,
and a deterministic model having the number of cutoffs at $S_n^i$ equal to
$p_i B(S_n^i)$, which is unrealistic for being too regular.

4.3 The embedded Markov chain 

As defined, $B(t)$ is a pure jump process; even with cutoffs caused by
failures accounted for, all sample paths can be assumed to be continuous
from the right. Pool the failure processes and denote the resulting
stationary Poisson process by $\{S_1, S_2, \ldots\}$. Define $B_n = B(S_n)$
($n = 1, 2, \ldots$); $B_n$ is the number of busy servers just before the $n$th
failure of any kind. The sequence $\{B_n : n = 1, 2, \ldots\}$ is a Markov chain,
called the embedded chain, with state space equal to $\{0, 1, \ldots, c\}$.
The survivors in the system at time $S_n$ have the same exponentially
distributed service times as new arrivals do, and their number is
determined only from $B_n$. The number of arrivals in $[S_n, S_{n+1}]$ is
independent of the number of arrivals before $S_n$. Note that the strong
Markov property is not required of the arrival process, for while the
failure epochs are random times, they are not determined by the
arrival process because of the assumed independence.

4.4 Properties of the embedded chain 

Let $W_n$ denote the number of calls cut off by the failure that occurs
at time $S_n$. Then, for each $n$, the conditional distribution of $W_n$, given
$B_n$, is a mixture of binomials:

$$P\{W_n = w \mid B_n = b\} = \frac{1}{\lambda} \sum_{i=1}^{m} \lambda_i \binom{b}{w} p_i^{w} (1 - p_i)^{b-w}, \qquad b = 0, \ldots, c, \quad w = 0, \ldots, b. \tag{9}$$

Here $\lambda_i/\lambda$ is the probability that the $n$th event in the pooled process
comes from failure process $i$ ($\lambda = \lambda_1 + \cdots + \lambda_m$). Denote the right-hand
side of eq. (9) by $q_{bw}$.
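
As an illustration of eq. (9), the following minimal sketch evaluates the mixture-of-binomials probabilities $q_{bw}$ for a given number of busy servers. The function name and the two failure modes used in the example are hypothetical and are not taken from the paper.

```python
from math import comb

def cutoff_pmf(b, rates, impacts):
    """Return [q_b0, ..., q_bb] as in eq. (9): a mixture of binomials.

    rates   -- failure rates lambda_1..lambda_m (illustrative values)
    impacts -- cutoff impacts p_1..p_m, each in [0, 1]
    """
    total = sum(rates)  # lambda = lambda_1 + ... + lambda_m
    pmf = []
    for w in range(b + 1):
        q = sum(lam * comb(b, w) * p ** w * (1 - p) ** (b - w)
                for lam, p in zip(rates, impacts)) / total
        pmf.append(q)
    return pmf

# Hypothetical example: a rare complete failure and a frequent mild one.
rates = [0.2, 1.0]
impacts = [1.0, 0.1]
q = cutoff_pmf(4, rates, impacts)
mean_cut = sum(w * qw for w, qw in enumerate(q))
```

The mean of this distribution is $b \sum_i (\lambda_i/\lambda) p_i$, which is the quantity that reappears in the centering constant of Section 4.8.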

Finally, note that the $W_n$'s are conditionally independent, given the
$B_n$'s, because of the independence of the cutting-off events corresponding
to different failure times. That is,

$$P\{W_{i_1} = w_1, \ldots, W_{i_n} = w_n \mid B_{i_1} = b_1, \ldots, B_{i_n} = b_n\} = \prod_{k=1}^{n} P\{W_{i_k} = w_k \mid B_{i_k} = b_k\} \tag{10}$$

for all positive integers $n, i_1, \ldots, i_n$.

The properties of the $B_n$-process can be most readily obtained from
the fundamental representation

$$B_{n+1} = C(S_{n+1} - S_n,\; B_n - W_n),$$

where the equality is equality in distribution. That is, the number of
busy servers at (just before) $S_{n+1}$ has the same distribution as the
number of busy servers in an ordinary (no server failures) M/M/c/c
system running for time $S_{n+1} - S_n$ with $B_n - W_n$ (the number of
survivors in the system at time $S_n$) calls in the system at time zero. It
has already been observed that $\{B_n : n = 1, 2, \ldots\}$ is a Markov chain;
straightforward conditioning arguments and appeal to the independence
of the failure and queuing processes establish that its transition
probabilities are given by

$$P\{B_{n+1} = j \mid B_n = i\} = \sum_{k=0}^{i} \sum_{r=1}^{m} \lambda_r \binom{i}{k} p_r^{\,i-k} (1 - p_r)^{k} \int_0^\infty P\{C(x, k) = j\}\, e^{-\lambda x}\, dx. \tag{11}$$

These are independent of $n$, so the chain has stationary transition
probabilities. Denote them by $p_{ij}$. We remark that if $p_r = 1$ for every
$r$, these reduce to

$$p_{ij} = \int_0^\infty P\{C(x, 0) = j\}\, \lambda e^{-\lambda x}\, dx,$$

so that the $\{B_n\}$ are mutually independent in this case. Also, if the failure
processes are not stationary Poisson, but are, say, renewal, then the $p_{ij}$
are still well defined, although they take a different form. In particular,
they then depend on $n$, and while $\{B_n\}$ is still a Markov process, it
does not have stationary transition probabilities. Some of the following
results (particularly those about recurrence) continue to hold in this
case, but limit laws are harder to obtain.
Riordan (Ref. 2) gives the distribution of $C(x, k)$:



j / c i 

P{C(x i k) = j)= p -[Y i p T} 

J l \i=o V- 






where $\rho = \alpha/\nu$, the $D_n$ are related to the Poisson-Charlier polynomials
$c_n$ (Ref. 3) by $D_n(s) = \rho^n c_n(s)$, and $r_1, \ldots, r_c$ are the roots of
$D_c(s + 1)$. These roots are all real and negative so that the $e^{r_i \nu x}$ all
vanish as $x \to \infty$, and the $P\{C(x, k) = j\}$ approach the well-known
Erlang equilibrium probabilities, independent of $k$. Equation (12)
shows that $P\{C(x, k) = j\}$ is an analytic function of $x$ that is not
identically zero, so that its zeros, if any, are isolated. Thus, there is a
set of positive measure in $[0, \infty)$ on which $P\{C(x, k) = j\} > 0$. This
means that $\int_0^\infty P\{C(x, k) = j\} \exp(-\lambda x)\, dx$ is positive for every $j$ and
$k$, and so $p_{ij} > 0$ for every $i$ and $j$. This positivity shows the $\{B_n\}$ chain
to be irreducible and aperiodic. Since the chain is finite, all states are
positive recurrent (Ref. 4, Section I.XV.6).
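
The transition probabilities of eq. (11) can also be evaluated numerically without the explicit transient solution. The sketch below relies on a standard identity that the paper does not use: if $G$ is the generator of the ordinary M/M/c/c process, then $\int_0^\infty P\{C(x,k)=j\}\,\lambda e^{-\lambda x}\,dx$ is the $(k,j)$ entry of $\lambda(\lambda I - G)^{-1}$. Function names and parameter values are illustrative assumptions.

```python
from math import comb

def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def embedded_transition_matrix(c, alpha, nu, rates, impacts):
    """Return the (c+1) x (c+1) matrix [p_ij] of the embedded chain."""
    lam = sum(rates)
    # Build lam*I - G for the ordinary (failure-free) M/M/c/c process.
    a = [[0.0] * (c + 1) for _ in range(c + 1)]
    for k in range(c + 1):
        up = alpha if k < c else 0.0      # arrivals are blocked when k == c
        down = k * nu
        a[k][k] = lam + up + down
        if k < c:
            a[k][k + 1] = -up
        if k > 0:
            a[k][k - 1] = -down
    resolvent = [[lam * x for x in row] for row in invert(a)]
    P = []
    for i in range(c + 1):
        row = [0.0] * (c + 1)
        for k in range(i + 1):            # k = number of survivors
            q = sum(l * comb(i, k) * p ** (i - k) * (1 - p) ** k
                    for l, p in zip(rates, impacts)) / lam
            for j in range(c + 1):
                row[j] += q * resolvent[k][j]
        P.append(row)
    return P

# Hypothetical single-server check against the closed forms of Section V.
P1 = embedded_transition_matrix(1, 2.0, 1.0, [0.5], [0.4])
```

For $c = 1$ the result agrees with the closed-form transition probabilities of eq. (20), and every entry is strictly positive, as the argument above requires.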

4.5 The induced Markov chain 

The two-dimensional process $\{(B_n, W_n) : n = 1, 2, \ldots\}$ is again a
Markov chain whose transition probabilities are given by $r_{s_i s_j} =
q_{b_j w_j} p_{b_i b_j}$, where $s_i = (b_i, w_i)$. That is,

$$P\{(B_{n+1}, W_{n+1}) = s_j \mid (B_n, W_n) = s_i\} = r_{s_i s_j} = q_{b_j w_j}\, p_{b_i b_j}.$$

Use is made here of eq. (10). This chain will be called the induced
chain.

It is desirable for the induced chain to inherit the properties of the
embedded chain discussed in Section 4.4. To obtain this, it would be
sufficient to have $q_{ij} > 0$ for all $i$ and $j$. From eq. (9), this is satisfied,
unless $p_i = 1$ for every $i$. The case $p_i = 1$ for every $i$ is a trivial special
case of what is to follow, because then $W_n = B_n$ with probability one,
for every $n$. Also, for large systems with many failure modes, this case
is of little interest. For these reasons, we will suppose that there is at
least one $i$ for which $p_i < 1$. Under this condition, $r_{s_i s_j} > 0$ for every $i$
and $j$, and since the induced chain is also finite (its state space is
$\{(b, w) : b = 0, 1, \ldots, c,\ w = 0, 1, \ldots, b\}$), it is irreducible, aperiodic,
and positive recurrent, just as the embedded chain was.

4.6 Stationarity

Since service objectives represent long term goals for system operation,
it is appropriate to compare the equilibrium features of the
model against the service objectives.

Since both the embedded and induced chains are positive recurrent,
they are both ergodic. The embedded chain has an invariant distribution
$\{u_k : k = 0, \ldots, c\}$ given by

$$u_k = \lim_{n \to \infty} p_{ik}^{(n)},$$

independent of $i$. As usual, the parenthesized superscript indicates
the $n$-step transition probability. Furthermore, $u_k > 0$ for each $k$,
$\sum_{k=0}^{c} u_k = 1$, and $u_k = \sum_{i=0}^{c} u_i p_{ik}$ (Ref. 4, Section I.XV.7). To say that
the system has been in operation for a long time can be expressed by
taking $\{u_0, \ldots, u_c\}$ to be the distribution of the number of busy
servers at time zero. With this choice of initial distribution, $\{B_n\}$
becomes a strictly stationary process.

The induced chain also has an invariant distribution, denoted by
$\{v_{(0,0)}, \ldots, v_{(c,c)}\}$. It is easy to see that $v_{(b,w)}$ is given by $v_{(b,w)} = q_{bw} u_b$,
$b = 0, \ldots, c$; $w = 0, \ldots, b$. The induced chain can also be made
strictly stationary by taking its initial distribution to be its invariant
distribution.

4.7 A strong law of large numbers 

The quantity of basic interest in this study is the cumulative number
of cutoff calls, $\chi_n = W_1 + \cdots + W_n$. This section is devoted to
describing a strong law of large numbers for $\chi_n$ and some of its
ramifications. This addresses the second item in the outline of Section
3.1.

In general, $\{W_n : n = 1, 2, \ldots\}$ is not a Markov process. However, it
can be written as a functional of the induced chain. The appropriate
functional to choose is $\pi_2$, the projection onto the second coordinate:
$W_n = \pi_2(B_n, W_n)$ for each $n$. $\pi_2$ is clearly a measurable function on the
$\sigma$-field of the induced chain, and so the limit theorems of Sections V.5
and V.7 of Ref. 5 may be applied to $\{W_n\}$.
Let $Z(t)$ be the number of calls accepted by the system in $[0, t]$,

z(t) = i va - T & 

and put $Z_n = Z(S_n)$.

Theorem 4: $\chi_n/n$ converges with probability one to $\theta \lim_{n \to \infty} (EZ_n/n)$.

Corollary 5: $\chi_n/Z_n$ converges to $\theta$ in expectation and with probability
one.
The proofs of these results can be found in Appendix B. 

Now this is not quite what is required for applications. Generally,
one does not count either carried calls or cutoff calls indexed by the
times of failure $S_1, S_2, \ldots$. Rather, what one does is keep a running
count of these items indexed by a continuous time parameter. Accordingly,
let $\chi(t)$ denote the total number of calls cut off in $[0, t]$; one has

$$\chi(t) = \sum_{n=1}^{\infty} W_n U(t - S_n) = \chi_{\max\{n : S_n \le t\}}.$$

Corollary 6: $\chi(t)/Z(t)$ converges to $\theta$ in expectation and with probability
one.

Applications of these results have been discussed in Section 2.1. In 
a stable Markovian environment, Corollary 6 says that the natural 
estimator of the probability of cutoff in the system, namely the cutoff 
call rate, is strongly consistent. The implication for measurement is 
that for systems in operation, measurements can be relied upon to 
estimate the underlying cutoff call rate that is characteristic of the 
system. The extension of these results to other than Markovian queues 
would provide even better approximations when the environment can 
be more precisely specified. The implication for system design is that 
once it is configured with certain failure modes, etc., its cutoff call rate, 
in the appropriate environment, will be as predicted, subject to sets of 
probability zero and the quality of the failure rate predictions. 
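
The consistency statement in Corollary 6 can be illustrated by simulation. The sketch below simulates only the jump chain of the model of Section 4.2, which is enough because the ratio of cutoffs to accepted calls depends only on the order of events. The comparison value $\Lambda/(\Lambda + \nu)$ with $\Lambda = \sum_i \lambda_i p_i$ is an assumption of this sketch: it is the probability that an exponential holding time is interrupted by the thinned Poisson stream of cutting events, and it is not quoted from the paper's eq. (8). All names and parameter values are hypothetical.

```python
import random

def simulate_cutoff_rate(c, alpha, nu, rates, impacts, n_events, seed=1):
    """Estimate (calls cut off) / (calls accepted) from the jump chain."""
    rng = random.Random(seed)
    lam = sum(rates)
    busy = accepted = cut = 0
    for _ in range(n_events):
        total = alpha + busy * nu + lam
        u = rng.random() * total
        if u < alpha:                        # arrival
            if busy < c:                     # accepted unless all servers busy
                busy += 1
                accepted += 1
        elif u < alpha + busy * nu:          # normal call completion
            busy -= 1
        else:                                # failure: choose the failure mode
            i = rng.choices(range(len(rates)), weights=rates)[0]
            # Each call in progress is cut off independently with prob. p_i.
            lost = sum(rng.random() < impacts[i] for _ in range(busy))
            busy -= lost
            cut += lost
    return cut / accepted

rates, impacts, nu = [0.3, 0.2], [0.5, 0.1], 1.0
cut_intensity = sum(l * p for l, p in zip(rates, impacts))
theta_model = cut_intensity / (cut_intensity + nu)   # assumed limiting value
estimate = simulate_cutoff_rate(10, 5.0, nu, rates, impacts, 200000)
```

With a long enough run the estimate settles near the model value, which is the behavior that strong consistency predicts for measurements taken on a system in operation.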

Before turning to central limit theorems, a partial indication of the
rate of approach to steady state will be given (Ref. 6). For this purpose,
assume that $c = \infty$ (so that all arriving calls immediately enter service)
and that $p_i = 1$ for all $i$ (so that every time a failure occurs, all calls in
the system are cut off). Then it can be shown that

$$E\left(\frac{\chi(t)}{A(t)}\right) = \frac{\lambda}{\lambda + \nu}\,\bigl(1 - e^{-\lambda t}\bigr)\left[1 - \frac{1 - e^{-(\lambda+\nu)t}}{(\lambda + \nu)t}\right]. \tag{13}$$

Eq. (13) can be used to estimate relative errors after different times.
Let

$$R(t) = \left(\frac{\lambda}{\lambda + \nu}\right)^{-1}\left[\frac{\lambda}{\lambda + \nu} - E\left(\frac{\chi(t)}{A(t)}\right)\right];$$

then $100R(t)$ is the percentage error in $E(\chi(t)/A(t))$ as an estimate of
$\theta$ after $t$ time units. Using eq. (13),

$$R(t) = e^{-\lambda t} + \frac{1}{(\lambda + \nu)t}\,\bigl(1 - e^{-\lambda t}\bigr)\bigl(1 - e^{-(\lambda+\nu)t}\bigr). \tag{14}$$

Measuring time in minutes, with $\lambda = 0.003$ (about three failures per
day) and $\nu = 0.166$ (six-minute average call holding time), the percentage
errors, from eq. (14), are 85 after one hour, 35 after six hours, 12
after 12 hours, and 2 after 24 hours.
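
The percentages quoted above can be checked directly. The sketch below assumes that eq. (14) reads $R(t) = e^{-\lambda t} + (1 - e^{-\lambda t})(1 - e^{-(\lambda+\nu)t})/((\lambda+\nu)t)$, which is the form that reproduces the quoted figures; the function name is hypothetical.

```python
from math import exp

def relative_error(t, lam, nu):
    """R(t) of eq. (14), with time t in the same unit as 1/lam and 1/nu."""
    g = (1.0 - exp(-(lam + nu) * t)) / ((lam + nu) * t)
    return exp(-lam * t) + g * (1.0 - exp(-lam * t))

lam, nu = 0.003, 0.166        # per minute, as in the text
hours = (1, 6, 12, 24)
percent = [round(100 * relative_error(60 * h, lam, nu)) for h in hours]
# percent == [85, 35, 12, 2]
```

The dominant term is $e^{-\lambda t}$, so the approach to steady state is governed by the failure rate rather than by the much shorter call holding time.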

4.8 A central limit theorem 

The existence of an asymptotic normal approximation for the cumulative
number of cutoff calls makes the construction of statistical
tests easier. In this section, we discuss these approximations in discrete
and continuous time. This addresses the third item in the outline of
Section 3.1.

The central limit theorem for $\chi_n$ follows directly from the central
limit theorem for functionals defined on a Markov chain; for example,
see Theorem V.7.5 in Ref. 5.
Theorem 7: There are positive numbers $\mu$ and $\sigma$ for which

$$\lim_{n \to \infty} P\left[\frac{\chi_n - n\mu}{\sigma\sqrt{n}} \le x\right] = \Phi(x),$$

where $\Phi(x)$ is the standard normal integral.

This requires little discussion: the condition ($D_0$) and the moment
condition of theorem V.7.5 of Ref. 5 are satisfied because the induced
chain is finite and positive recurrent. The interesting results are the
values of the centering and scale parameters. It is easy to see that

$$\mu = EW_1 = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i\, EB_1 = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i \sum_{b=0}^{c} b\, u_b = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i \sum_{b=0}^{c} \frac{b}{m_{bb}}, \tag{15}$$

where $m_{ab}$ is the mean first passage time from state $a$ to state $b$ in the
embedded chain. (If $a = b$, this is a mean recurrence time.)
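
A short sketch of how the centering constant and the normal approximation of Theorem 7 might be used once the invariant distribution $u$ and the scale constant $\sigma$ are available. The function names and numerical values below are hypothetical; $\sigma$ is treated as a given input.

```python
from math import erf, sqrt

def centering_constant(rates, impacts, u):
    """mu = sum_i (lambda_i/lambda) p_i * sum_b b*u_b, as in eq. (15)."""
    lam = sum(rates)
    mean_impact = sum(l * p for l, p in zip(rates, impacts)) / lam
    mean_busy = sum(b * ub for b, ub in enumerate(u))
    return mean_impact * mean_busy

def normal_cdf(z):
    """Standard normal integral Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cutoff_count_cdf(x, n, mu, sigma):
    """Normal approximation to P{chi_n <= x} suggested by Theorem 7."""
    return normal_cdf((x - n * mu) / (sigma * sqrt(n)))

# Hypothetical inputs: two busy-server states, one failure mode.
u = [0.4, 0.6]
mu = centering_constant([0.5], [0.4], u)     # 0.4 * 0.6 = 0.24
prob = cutoff_count_cdf(260.0, 1000, mu, 0.5)
```

Such a function gives the approximate probability that the cutoff count after $n$ failures stays below a threshold, which is the ingredient needed for the statistical tests mentioned at the start of this section.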
Theorem 8(a): The asymptotic variance of the partial sums of the
$B_k$'s is

$$\lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\left(\sum_{k=1}^{n} B_k\right) = \sum_{a=0}^{c} \sum_{b=0}^{c} \frac{ab}{m_{aa} m_{bb}} \left(\frac{m_{bb}^{(2)}}{m_{bb}} - 2 m_{ab}\right) + \sum_{b=0}^{c} \frac{b^2}{m_{bb}}, \tag{16}$$

where $m_{bb}^{(2)}$ is the second moment of the recurrence time for state $b$ in
the embedded chain.

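Theorem 8(b): The scale constant in the central limit theorem is

$$\sigma^2 = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i (1 - p_i) \sum_{b=0}^{c} \frac{b}{m_{bb}} + \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i^2 \sum_{b=0}^{c} \frac{b^2}{m_{bb}} + \left(\sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i\right)^{2} \sum_{a=0}^{c} \sum_{b=0}^{c} \frac{ab}{m_{aa} m_{bb}} \left(\frac{m_{bb}^{(2)}}{m_{bb}} - 2 m_{ab}\right). \tag{17}$$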

We have written the centering and scale parameters in terms of the 
moments of the recurrence times for the embedded chain. These can 
be found by solving for the invariant distribution of the embedded 
chain (Appendix A). The mean recurrence times are then just the 
reciprocals of the elements of the invariant distribution, and the second 
moments can be obtained from the first moments by using Theorem 
1.11.7 of Ref. 7. The mean first passage times $m_{ij}$ can be found by
solving another system of linear equations, for example see Theorem 
6-7A of Ref. 8. For even moderate values of c, it appears that the 
wisest thing to do in applications is to solve the system of eqs. (25) 
numerically. The single-server case is treated explicitly in the next 
section, and it can be seen that even in this case, the computations are 
extensive. 

In continuous time, the central limit theorem looks slightly different.
This is because counting the number of cutoffs according to the
number of failures, rather than over time, introduces a random time
transformation with scale $\lambda$.

Theorem 9: The distribution of the normalized cumulative number
of cutoff calls over time,

$$\frac{\chi(t) - \mu\lambda t}{\sigma\sqrt{\lambda t}}, \tag{18}$$

converges weakly to the cumulative distribution function (cdf) of a
normal random variable having mean zero and variance $1 + \mu^2/\sigma^2$.

V. THE SINGLE-SERVER SYSTEM 

In this section we discuss in detail the results of the previous sections
as they apply to the single-server system. We will explicitly solve for
the invariant distributions and, thereby, be able to represent the
parameters of the limit laws in terms of the arrival, service, and failure
rates.

If there is only a single server, we will suppose that there is only one
failure mode, of rate $\lambda$ and cutoff impact $p$. Certainly if all failures are
complete failures, $p = 1$. We can allow $p < 1$ to account for malfunctions
which may only sometimes cut off calls. Other failure modes with
other severities could be allowed. Solving the Chapman-Kolmogorov
system of eqs. (25) and making use of Theorem 10 we obtain

$$u_0 = r_0 = \frac{p\lambda + \nu}{p\lambda + \nu + \alpha}, \qquad u_1 = r_1 = \frac{\alpha}{p\lambda + \nu + \alpha}. \tag{19}$$

From eqs. (11) and (12) we obtain the transition probabilities

$$p_{00} = \frac{\nu + \lambda}{\lambda + \nu + \alpha}, \qquad p_{01} = \frac{\alpha}{\lambda + \nu + \alpha}, \qquad p_{10} = \frac{\nu + p\lambda}{\lambda + \nu + \alpha}, \qquad p_{11} = \frac{(1 - p)\lambda + \alpha}{\lambda + \nu + \alpha}. \tag{20}$$

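The mean first passage and recurrence times can then be obtained as
indicated in Section 4.8:

$$m_{00} = \frac{p\lambda + \nu + \alpha}{p\lambda + \nu}, \qquad m_{01} = \frac{\lambda + \nu + \alpha}{\alpha}, \qquad m_{10} = \frac{\lambda + \nu + \alpha}{p\lambda + \nu}, \qquad m_{11} = \frac{p\lambda + \nu + \alpha}{\alpha}. \tag{21}$$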
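
The closed forms in eqs. (19) through (21) are easy to check against one another. The following sketch, with hypothetical names and parameter values, builds the single-server quantities and confirms that $u$ is invariant under the transition matrix and that the mean recurrence times are the reciprocals of the invariant probabilities.

```python
def single_server(alpha, nu, lam, p):
    """Closed-form single-server quantities of eqs. (19)-(21)."""
    d = lam + nu + alpha
    e = p * lam + nu + alpha
    P = [[(nu + lam) / d, alpha / d],
         [(nu + p * lam) / d, ((1 - p) * lam + alpha) / d]]
    u = [(p * lam + nu) / e, alpha / e]
    m = [[e / (p * lam + nu), d / alpha],
         [d / (p * lam + nu), e / alpha]]
    return P, u, m

# Illustrative rates per minute; alpha and p are made up for the example.
P, u, m = single_server(alpha=0.5, nu=0.166, lam=0.003, p=0.5)

# Invariance u = uP, checked component by component.
uP = [u[0] * P[0][j] + u[1] * P[1][j] for j in range(2)]
```

For two states the mean first passage times are simply $m_{01} = 1/p_{01}$ and $m_{10} = 1/p_{10}$, since the chain must leave a state by jumping directly to the other one.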

Let $\sigma_{11}^2$ be the variance of the recurrence time for state 1. By using
Theorem 1.11.7 of Ref. 7, we obtain

$$\sigma_{11}^2 = \frac{(p\lambda + \nu)\bigl((2 - p)\lambda + \nu + \alpha\bigr)}{\alpha^2}. \tag{22}$$

Since $\sigma^2$ from Theorem 8(b) reduces to $p\sigma_{11}^2/m_{11}^3$ in case $c = 1$, we
obtain

$$\sigma^2 = \frac{p\alpha(p\lambda + \nu)\bigl((2 - p)\lambda + \nu + \alpha\bigr)}{(p\lambda + \nu + \alpha)^3}. \tag{23}$$

This is the scale constant for Theorem 7. The centering constant is

$$\mu = \frac{p\alpha}{p\lambda + \nu + \alpha}.$$

VI. APPLICATIONS

These results have been applied at Bell Laboratories to predict the 
cutoff call performance of certain toll switching systems, and to eval- 
uate reliability objectives for these systems on the basis of a determi- 
nation of whether the cutoff call objective for the system can be met 
if these reliability objectives are followed. 

In one such example, a system terminating 22,000 trunks was con- 
sidered. Thirteen failure modes that were significant for cutoff calls in 
the system were identified. Table I lists, for each failure mode, the size 
of the unit failing or the number of terminations affected by the failure, 
the failure rate expressed as a mean number of failures per year, and 
the cutoff impact. For most of the failure modes, there was more than 
one type of unit or subsystem of the given size. The failure rates of all 
the units or subsystems of a single size were added together to obtain 
the failure rate for that failure mode. This is done because we are going 
to assume uniform distribution of calls over terminations, as discussed 
in Section 4.2. If more precise information on the distribution of calls 
over terminations or location of failed units is available, it may be 
more reasonable not to pool, but to carry individual information, as 
appropriate. 

Every stable call in the system must occupy two terminations, one 
incoming and one outgoing. For a particular call, the failed unit or 
subsystem may be on the incoming side of the switch, the outgoing 
side, or both, or neither. Then the estimation of the cutoff impact of 
a failure mode is like a problem in sampling without replacement in 
which one counts the number of paths through the switch that contain 
the failed unit or subsystem. If the total number of terminations on 
the switch is N and the number of terminations affected by a failure 
of type i is m, then the cutoff impact for failure mode i is 

$$p_i = \frac{m(2N - m - 1)}{N(N - 1)}. \tag{24}$$

Table I. Failure modes, frequencies, and cutoff impacts for example in Section VI

Failure    Terminations    Failures      Cutoff
Mode       Affected        per Year      Impact
 1         22,000             0.248      1.0
 2          5,500             0.195      0.438
 3          4,080             0.077      0.337
 4          2,040             0.0004     0.177
 5          1,920             0.355      0.167
 6            840             0.482      0.075
 7            512            10.819      0.046
 8            128            66.667      0.012
 9            120             0.263      0.011
10             32            22.727      0.003
11             16            20.0        0.0015
12              8           217.391      0.0007
13              1          1030.0        0.0001

Using eq. (8) we find that in a Markovian environment, the probability 
that a call entering the system will be cut off because of one of these 
failures is $0.24 \times 10^{-4}$, when the mean call-holding time is six minutes.
Based on this, it was concluded that a sufficient margin of safety 
existed to ensure that the system's cutoff call objective would be met, 
even after allowing for possible errors in the specification of failure 
modes and rates, and other possibilities that could not be accounted 
for in the analysis. 
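
The figures in this example can be reproduced with a few lines. The sketch below computes each cutoff impact as one minus the probability that neither of a call's two terminations, drawn without replacement from the $N$ terminations, lies in the failed unit. It then combines the tabulated rates and impacts under the assumption that, for exponential holding times, the cutoff probability is $\Lambda/(\Lambda + \nu)$ with $\Lambda = \sum_i \lambda_i p_i$. That closed form and the 365-day year are assumptions of this sketch; eq. (8) itself is not restated in this section.

```python
N = 22000  # total terminations on the switch

# (terminations affected, failures per year, tabulated cutoff impact)
table = [
    (22000, 0.248, 1.0), (5500, 0.195, 0.438), (4080, 0.077, 0.337),
    (2040, 0.0004, 0.177), (1920, 0.355, 0.167), (840, 0.482, 0.075),
    (512, 10.819, 0.046), (128, 66.667, 0.012), (120, 0.263, 0.011),
    (32, 22.727, 0.003), (16, 20.0, 0.0015), (8, 217.391, 0.0007),
    (1, 1030.0, 0.0001),
]

def cutoff_impact(m, n_total):
    """P(at least one of a call's two terminations is in the failed unit)."""
    untouched = (n_total - m) * (n_total - m - 1) / (n_total * (n_total - 1))
    return 1.0 - untouched

computed = [cutoff_impact(m, N) for m, _, _ in table]

# Cutoff probability for a six-minute mean holding time (assumed formula).
completions_per_year = 365 * 24 * 60 / 6.0            # nu, in calls per year
cut_intensity = sum(rate * p for _, rate, p in table)  # Lambda, per year
theta = cut_intensity / (cut_intensity + completions_per_year)
```

The computed impacts agree with the tabulated ones to the precision shown in Table I, and `theta` comes out at about $0.24 \times 10^{-4}$, matching the value quoted above.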
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APPENDIX A 

The Invariant Distributions in Discrete and Continuous Time 

As pointed out in Section 4.8, the centering and scale constants for
the strong law and the central limit theorem are all written in terms of
the mean first passage and recurrence times for the $\{B_n\}$ process. It
appears from eqs. (11) and (12) that use of theorems 1.7.1 and 1.6.1 of
Ref. 7 to find the invariant distribution of $\{B_n\}$ will require significant
effort. In this appendix, we will derive the Chapman-Kolmogorov
equations for the $\{B(t)\}$ process. Finding the invariant distribution of
the $\{B(t)\}$ process by solving these equations is easier than solving for
the invariant distribution of the discrete-time process using the transition
probabilities in eq. (11). It is also a more attractive procedure
numerically, because the matrix of coefficients is upper triangular with
only a single nonzero subdiagonal, consisting of all $\alpha$'s. Finally, these
results are tied together by Theorem 10, which indicates that these
two invariant distributions are identical.

Let $r_n(t) = P\{B(t) = n\}$. Then for $h > 0$, we can write

$$r_n(t + h) = \sum_{k=0}^{c} P\{B(t + h) = n \mid B(t) = k,\ S_j \notin [t, t + h]\ \forall j\}\, P\{B(t) = k,\ S_j \notin [t, t + h]\ \forall j\} + \sum_{k=n}^{c} P\{B(t + h) = n \mid B(t) = k,\ S_j \in [t, t + h]\ \exists j\}\, P\{B(t) = k,\ S_j \in [t, t + h]\ \exists j\}.$$

To simplify the following display, in the first sum, all terms involving
both an arrival and a departure in $[t, t + h]$ have an $h^2$ in them, and
so can be left off. Similarly, in the second sum, because of the $\lambda h$ that
will appear in front, all terms involving either an arrival or a departure
can be left off. We obtain, omitting terms $o(h)$ or higher,

$$r_0(t + h) = (1 - \lambda h)(1 - \alpha h)[r_0(t) + \nu h\, r_1(t)] + \lambda h \sum_{k=0}^{c} q_{kk}\, r_k(t),$$

$$r_n(t + h) = (1 - \lambda h)(1 - \alpha h)[(1 - n\nu h) r_n(t) + (n + 1)\nu h\, r_{n+1}(t)] + (1 - \lambda h)\alpha h\, r_{n-1}(t) + \lambda h \sum_{k=n}^{c} q_{k,k-n}\, r_k(t), \qquad 1 \le n \le c - 1,$$

$$r_c(t + h) = (1 - \lambda h)[(1 - c\nu h) r_c(t) + \alpha h\, r_{c-1}(t)] + \lambda h (1 - c\nu h)\, q_{c0}\, r_c(t).$$

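Collecting terms, simplifying, dividing by $h$, and letting $h \to 0$, we
obtain

$$r_0'(t) = -\alpha r_0(t) + \nu r_1(t) - \lambda\Bigl[r_0(t) - \sum_{k=0}^{c} q_{kk}\, r_k(t)\Bigr],$$

$$r_n'(t) = -(\alpha + n\nu) r_n(t) + \alpha r_{n-1}(t) + (n + 1)\nu\, r_{n+1}(t) - \lambda\Bigl[r_n(t) - \sum_{k=n}^{c} q_{k,k-n}\, r_k(t)\Bigr], \qquad 1 \le n \le c - 1,$$

$$r_c'(t) = -c\nu\, r_c(t) + \alpha r_{c-1}(t) - \lambda[r_c(t) - q_{c0}\, r_c(t)].$$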
In equilibrium, we look for solutions with $r_j = \lim_{t \to \infty} P\{B(t) = j\}$ and
$\lim_{t \to \infty} r_j'(t) = 0$. Then these equations become

$$\begin{aligned}
0 &= -(\alpha + \lambda) r_0 + \nu r_1 + \lambda \sum_{k=0}^{c} q_{kk}\, r_k, \\
0 &= \alpha r_{n-1} - (\alpha + n\nu + \lambda) r_n + (n + 1)\nu\, r_{n+1} + \lambda \sum_{k=n}^{c} q_{k,k-n}\, r_k, \qquad 1 \le n \le c - 1, \\
0 &= \alpha r_{c-1} - (c\nu + \lambda - \lambda q_{c0}) r_c, \\
1 &= \sum_{k=0}^{c} r_k,
\end{aligned} \tag{25}$$

where the condition that $\{r_0, \ldots, r_c\}$ be a probability distribution has
been added. These are the equations used to solve for the invariant
distribution of the continuous-time process. Writing $\rho = \alpha/\nu$, $\lambda' =
\lambda/\nu$, and $r = (r_0, \ldots, r_c)^T$, the first $c + 1$ equations can be written in
matrix form as

$$[M(\rho) + \lambda' Q]\, r = 0,$$

where $M(\rho)$ is the standard matrix for the M/M/c/c birth-death
process (Ref. 9, Section 2.1), and

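$$Q = \begin{pmatrix}
0 & q_{11} & q_{22} & \cdots & q_{cc} \\
0 & q_{10} - 1 & q_{21} & \cdots & q_{c,c-1} \\
0 & 0 & q_{20} - 1 & \cdots & q_{c,c-2} \\
\vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & q_{c0} - 1
\end{pmatrix}.$$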

The equations in this form show clearly that when $\lambda = 0$, we recover
the ordinary M/M/c/c system, as expected. The $M(\rho)$ matrix is tridiagonal
and $Q$ is upper triangular, leading to the attractive form for
numerical work mentioned above.
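
A minimal numerical sketch of solving eqs. (25) follows. It assembles the balance equations row by row, replaces the last one by the normalization condition (the balance equations are linearly dependent, so one of them is redundant), and solves by Gaussian elimination. A general dense solver is used for brevity instead of exploiting the near-triangular structure; the function names and test values are hypothetical.

```python
from math import comb

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def invariant_distribution(c, alpha, nu, rates, impacts):
    """Invariant distribution (r_0, ..., r_c) from eqs. (25)."""
    lam = sum(rates)

    def q(b, w):  # eq. (9): P{w calls cut | b busy servers}
        return sum(l * comb(b, w) * p ** w * (1 - p) ** (b - w)
                   for l, p in zip(rates, impacts)) / lam

    rows = []
    for n in range(c + 1):
        row = [0.0] * (c + 1)
        if n > 0:
            row[n - 1] += alpha
        row[n] -= (alpha if n < c else 0.0) + n * nu + lam
        if n < c:
            row[n + 1] += (n + 1) * nu
        for k in range(n, c + 1):       # failures leaving n survivors
            row[k] += lam * q(k, k - n)
        rows.append(row)
    rows[c] = [1.0] * (c + 1)           # normalization replaces last equation
    return solve(rows, [0.0] * c + [1.0])

r = invariant_distribution(1, 2.0, 1.0, [0.5], [0.4])
```

For $c = 1$ the output matches eq. (19), and when the cutoff impacts are all zero the failures have no effect and the Erlang distribution of the ordinary M/M/c/c system is recovered.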

It remains to show that the two invariant distributions, for continuous
time and for discrete time, are identical.
Theorem 10: $r_j = u_j$, $j = 0, \ldots, c$.

Proof. Define $B^*(t) = \sum_{n=1}^{\infty} B_n I(S_n \le t < S_{n+1})$, where $I$ denotes the
indicator function. Since $\{S_n\}$ is a Poisson process, $B^*(t)$ is a Markov
process which will be thought of as a semi-Markov process embedded
in the continuous-time busy server process. The distribution of the
time between transitions in this process is exponential($\lambda$), regardless
of the starting state, and so the expected time to the next transition,
starting from state $i$, is $1/\lambda$ for every $i$. From Section 6.3(ii) of Ref. 10
we obtain that $\lim_{t \to \infty} P\{B^*(t) = j \mid B^*(0) = i\} = u_j$ for $j = 0, \ldots, c$.
Next, the distribution of the time from an arbitrary epoch back to the
most recent failure is also exponential($\lambda$), so that using Section 6.3(iv)
of Ref. 10, we obtain

$$r_j = \sum_{i=0}^{c} u_i \int_0^\infty P\{B(S_n + t) = j \mid B_n = i\}\, \lambda e^{-\lambda t}\, dt,$$

regardless of the value of $n$ because of the stationarity of $\{B_n\}$. For $t$
with $S_n + t < S_{n+1}$, one gets $B(S_n + t) = j$ by having $k$ survivors in the
system at time $S_n$ and letting the M/M/c/c system evolve from there
($k = 0, 1, \ldots, i$). This has probability $\sum_{k=0}^{i} q_{i,i-k} P\{C(t, k) = j\}$, so

$$r_j = \sum_{i=0}^{c} u_i \sum_{k=0}^{i} q_{i,i-k} \int_0^\infty P\{C(t, k) = j\}\, \lambda e^{-\lambda t}\, dt = \sum_{i=0}^{c} u_i p_{ij} = u_j.$$ ■



APPENDIX B 
Proofs 

In this appendix, we provide proofs for Lemma 1, Theorems 2 and 4,
Corollaries 5 and 6, and Theorems 8 and 9. The blot symbol ■ signifies
the end of a proof.
Proof of Lemma 1: For $t, y > 0$ and $k \ge 2$, begin by writing

$$P\{N(t + y) - N(t) = k\} = \sum_{j=0}^{\infty} P\{N(t + y) = k + j,\ N(t) = j\} = \sum_{j=0}^{\infty} P\{S_{k+j} \le t + y < S_{k+j+1},\ S_j \le t < S_{j+1}\},$$

where the interrenewal times are $X_1, X_2, \ldots$, $S_n = X_1 + \cdots + X_n$, and
$S_0 = 0$. Now condition on $S_j = s$, $X_{j+1} = x$, and $X_{j+2} + \cdots + X_{j+k} = u$.
Using the independence and identical distribution of the interrenewal
times, together with some algebraic simplification, leads to eqs. (1) and
(2). The sum and integral can be interchanged because of the uniform
convergence of the renewal function on compact intervals. The proof
is similar for the cases $k = 0$ and 1. ■

Proof of Theorem 2: Let Ni(t) be the renewal counting process for 
failure mode i, and let T n stand for the event that the nth arriving call 
is accepted into the system and survives to the end of its intended 
holding time without being cut off. Then there is a version of 
P{T n \Y n = y, r'n = t) that is given by 

S ... £ P{T n \Ni(t + y)-N i (t)=k i ,i=l,...,m) 

ft,=0 A„,=0 

•P{N,(t + y)- Ni(t) = ki, i = 1, • • • , m) 

oo oo m 

- I ••• 1 II (1 - Pd ki P{Ni(t + y) - Ni(t) = ki) 

k,=0 k„=0 1=1 

oo oc m C 

k. 



= I '•• 1 nd-Pi)' g'kM-s,y)dMb(s) 



k,=0 



= 11 I (1-PiV g l k(t-s,y)dMh(s), 

«=1 A=0 J 

where the superscript i on the g indicates the function from eq. (2) 
which belongs to failure mode i. Now insert the expression for g\ from 
Lemma 1, simplify, and use eq. (4). This leads to the desired conditional 
probability's being given by 

m f 

II gi(t-s,y)dMb(s). 

•-1 Jo 

Equation (5) is then obtained by unconditioning on the holding-time 
distribution and subtracting from one to obtain the probability of 
cutoff. To obtain eq. (6), first observe that since gi is directly Riemann 
integrable, the basic renewal theorem (Ref. 4, Section II.XI.l) applies, 
yielding
lim gi(t - s, y)dMb(s) = A, g,{u, y)du

'— Jo Jo 

- 1 - \tpi I Mo(w + J - x)dF i (x)du. 

JO Ju 

The Lebesgue bounded convergence theorem then allows the inter- 
change of limit with the integrals in eq. (5), yielding eq. (6). ■ 
Proof of Theorem 4: The existence of the a.s. limit as $n \to \infty$ of $\chi_n/n$
follows from standard results about regenerative processes. These
results, in a Markov chain setting (e.g., Theorem 1.15.2 of Ref. 7), show
that $\chi_n/n$ converges w.p. 1 to

$$\sum_{b=0}^{c} \sum_{w=0}^{b} w\, v_{(b,w)},$$

which, upon reversing order of summation, is seen to be equal to $EW_1$
since $\{W_n\}$ is a stationary process. Note that one also has $EW_1 =
\lim_{n \to \infty} E\chi_n/n$. To complete the proof, straightforward calculations
show that $E\chi_n = \theta EZ_n$, and that $\lim_{n \to \infty} EZ_n/n$ exists. ■

Proof of Corollary 5: Write $\chi_n/Z_n = (\chi_n/n)(n/Z_n)$ to obtain the
result. ■

Proof of Corollary 6: Use theorem 8.1 of Ref. 12. ■

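Proof of Theorem 8(a): Let $V_j(n)$ denote the number of visits to state
$j$ in the first $n$ transitions of the embedded chain. Then

$$\sum_{k=1}^{n} B_k = \sum_{j=0}^{c} j\, V_j(n),$$

and it follows that

$$B_1 = \sum_{j=0}^{c} j\, V_j(1) \quad \text{and} \quad B_k = \sum_{j=0}^{c} j\,[V_j(k) - V_j(k - 1)], \qquad k \ge 2. \tag{26}$$

Using Lemma 7.3 of Ref. 5 and the stationarity of the embedded chain,
our first step is

$$\lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\left(\sum_{k=1}^{n} B_k\right) = \operatorname{Var} B_1 + 2 \sum_{k=2}^{\infty} [EB_1 B_k - (EB_1)^2]. \tag{27}$$

The variance of $B_1$ is easily seen to be

$$\operatorname{Var} B_1 = \sum_{b=0}^{c} \frac{b^2}{m_{bb}} - \sum_{a=0}^{c} \sum_{b=0}^{c} \frac{ab}{m_{aa} m_{bb}}. \tag{28}$$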

For the second term, use the representation in eq. (26), exchange order 
of summation, and sum by parts to obtain 

I [EBiBk - (EB,) 2 ] 

k~2 



1886 THE BELL SYSTEM TECHNICAL JOURNAL, OCTOBER 1981 



i-O ;=0 *- 



= J I ij lim E[Vi(l)Vj{R) - Vi(l) V y <l)] - 



R- 1 

m u m l! 



(29) 



To simplify this, observe that $EV_i(1)V_j(1) = EV_i(1)^2$ when $j = i$ and is
zero otherwise. Also, $P\{V_i(1) = x\} = 1 - u_i$ for $x = 0$, it equals $u_i$ for
$x = 1$, and is zero for $x \ge 2$, so that $EV_i(1)^2 = u_i = 1/m_{ii}$. Equation (29)
becomes

c .-2 



£ i V Mm 



EVi(l)Vj(R)- 



R-l 



mum. 



-Si- 

;=0 /Wij 



(30) 



Further simplifying, observe that

$$EV_i(1)V_j(R) = E(V_j(R) \mid V_i(1) = 1)\, P\{V_i(1) = 1\} = E(V_j(R) \mid B_1 = i)\, u_i,$$

so that the limit to be evaluated in eq. (30) becomes, after factoring
out the common term $1/m_{ii}$,



lim 

fl-.°o 



E(Vj{R)\Bi = i)- 



R-l 



771 1 



(31) 



Now, letting / stand for the indicator function, Vj(R) = ££=i 
/(fin =», so that E(Vj{R)\Bi = i) = j£3p§. When; = i we obtain 
immediately, using Theorem 1.6.5 of Ref. 7, that the limit in eq. (31) is 
given by 

mf + ma 



2ml 

When; ^ i, add and subtract ££o pjj* in eq. (31), and use Theorem 
1.11.4 together with Theorem 1.6.5 of Ref. 7 to obtain that the limit in 
eq. (31) is given, in this case, by 



(2) 
Mjj + m JJ 

2m a 



mij 
mjj 



We obtain, finally, 



2 J [EM* - (£Bi) 2 ] =nf 
a_2 «=o y-o "»« 



V /m}/ + TO// 2m, 



.(2) 



771 



//' 



771 j 



(32) 



Combining eqs. (27), (28), and (32) yields eq. (16), as was to be
proved. ■

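Proof of Theorem 8(b): From Theorem 7.5 of Ref. 5, the scale constant
for the central limit theorem for the induced chain is the asymptotic
variance of $\chi_n$. We have

$$\operatorname{Var} \chi_n = E \sum_{k=1}^{n} W_k^2 + 2E \sum_{j<k} W_j W_k - \left(E \sum_{k=1}^{n} W_k\right)^{2}.$$

The last term on the right is equal to $n^2\mu^2$, with $\mu$ as in eq. (15), because
$\{W_n\}$ is stationary.

Using the conditional independence (eq. (10)), one shows that

$$EW_j W_k = \left(\sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i\right)^{2} EB_j B_k \qquad \text{for } j \ne k,$$

and using eq. (9), one shows that

$$EW_1^2 = \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i (1 - p_i)\, EB_1 + \sum_{i=1}^{m} \frac{\lambda_i}{\lambda} p_i^2\, EB_1^2.$$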

Combining these and simplifying leads to 

1 m A- c b 

-Var X n=SYP.(l-/>.) £ — 

i=i A \,=i A / b=oTribb n \j=i A / \k=i 

The remainder of the proof consists in using Theorem 8(a) followed by 

algebraic manipulation. ■ 

Proof of Theorem 9: Begin by writing

$$\frac{\chi_n - \mu\lambda S_n}{\sigma\sqrt{n}} = \frac{\chi_n - n\mu}{\sigma\sqrt{n}} - \frac{\mu}{\sigma} \cdot \frac{S_n - n/\lambda}{\sqrt{n}/\lambda}. \tag{33}$$

By Theorem 7, the distribution of the first term on the right converges
to the standard normal cdf. The distribution of the second term
converges to the cdf of a normal random variable having mean zero
and variance $\mu^2/\sigma^2$. We will show that for each $n$, these two terms are
independent.

The stochastic process $B^*(t)$ defined in the proof of Theorem 10 is
a Markov pure jump process, and $X_1 = S_1$ and $B_1$ are independent
because of the independence of the failure process and the arrival and
service time processes (or use Theorem 15.28 of Ref. 11). Since the
$\sigma$-field of the $W$'s is contained in that of the $B$'s, $S_1 = X_1$ and $\chi_1 = W_1$
are independent too. By Proposition 15.27 of Ref. 11, $S_1$ is a Markov
time for the process, so that the process $\tilde{B}^*(t) = B^*(t + S_1)$ for $t \ge 0$
is a Markov process whose initial distribution is $P\{B_1 = b\}$, $b =
0, \ldots, c$. But because of the stationarity of $\{B_n\}$, $P\{B_1 = b\} = u_b$,
$b = 0, \ldots, c$, so that $\tilde{B}^*(t)$ and $B^*(t)$ are equivalent processes. Hence,
$X_2$ and $B_2$ are independent, and so are $X_2$ and $W_2$, from which it follows
that $S_2$ and $\chi_2$ are independent. The result for $S_n$ and $\chi_n$ follows by
induction.

It follows that the limit of the distribution of the quantity in eq. (33)
is the cdf of a normal random variable having mean zero and variance
$1 + \mu^2/\sigma^2$. Now apply Theorem 8.1 of Ref. 12 to obtain the final result.
The sufficient condition of that theorem is satisfied, because, using the
notation of Ref. 12, $M^*(n) \le W_{n+1} + \mu\lambda X_{n+1}$ with probability one. ■